fsk119 (Member) commented Sep 25, 2025

What is the purpose of the change

Support parsing the VECTOR_SEARCH function.

Brief change log

  • Add SqlVectorSearchTableFunction
  • Add type inference for the newly added function
  • Set the correct scope for VECTOR_SEARCH function

Verifying this change

This change added tests and can be verified as follows:

  • Added plan tests for VECTOR_SEARCH function

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

flinkbot (Collaborator) commented Sep 25, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

* vectors during runtime.
*
* <p>Compared to {@link ScanTableSource}, the source does not have to read the entire table and can
* lazily fetch individual values from a (possibly continuously changing) external table when
Contributor

It would be interesting to compare it to a LookupTableSource also.

Member Author

I will add this part to the documentation.

PARAM_SEARCH_TABLE,
PARAM_COLUMN_TO_SEARCH,
PARAM_COLUMN_TO_QUERY,
PARAM_TOP_K));
Contributor

@davidradl Sep 25, 2025

I can see what TOP_K is from googling. It would be useful to add documentation describing the parameters with this change, including the defaults. I wonder if we should add paging to handle a large number of results, or is this not done with vector databases?

Member Author

Of course, the documentation will be added when all tasks are finished. The current API doesn't support paging; I think we can leave this as future work.
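For readers unfamiliar with TOP_K, the sketch below shows what a top-k nearest-neighbour lookup does in general: it keeps only the k rows closest to the query vector. It is a brute-force illustration under stated assumptions, not Flink's implementation or any vector database's API. The class name `TopKSearchSketch`, the method names, and the choice of squared Euclidean distance are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration of top-k vector search; not Flink's implementation. */
final class TopKSearchSketch {

    /** Squared Euclidean distance between two vectors of equal length (assumed metric). */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector dimensions differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Returns the indices of the k rows closest to the query, nearest first.
     * A bounded max-heap keeps at most k candidates in memory at any time.
     */
    static List<Integer> topK(double[][] rows, double[] query, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Max-heap on distance: the head is the worst candidate kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                Comparator.comparingDouble(
                        (Integer i) -> squaredDistance(rows[i], query)).reversed());
        for (int i = 0; i < rows.length; i++) {
            heap.add(i);
            if (heap.size() > k) {
                heap.poll(); // drop the farthest candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(
                (Integer i) -> squaredDistance(rows[i], query)));
        return result;
    }
}
```

Because only k candidates survive, a caller who wants more results has to ask for a larger k. Paging is not modelled here, in line with the reply that the current API doesn't support it.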

github-actions bot added the community-reviewed label (PR has been reviewed by the community) Sep 25, 2025
Contributor

@lihaosky left a comment

Thanks @fsk119! Left some comments.

throw new ValidationException(
"The query column is not literal, please use LATERAL TABLE to run VECTOR_SEARCH.");
}
SqlValidatorScope scope = getSelectScope((SqlSelect) binding.operand(0));
Contributor

nit: looks like there's no need to cast, based on line 2618?

Member Author

The cast is required. Line 2618 uses SqlCall to get the operand (its return type is <S extends SqlNode> S), but here we use SqlCallBinding to get the operand (its return type is SqlNode). Therefore, we still need the cast here.

We cannot use SqlCall to extract the operand because VECTOR_SEARCH allows users to pass named arguments, which means the operands may be out of order and we need to use the names to reorder them.
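The two points in this reply can be sketched with small stand-in classes. `Node`, `SelectNode`, `Call`, and `Binding` below are hypothetical substitutes for SqlNode, SqlSelect, SqlCall, and SqlCallBinding. They only mimic the accessor signatures quoted above and the idea of reordering named arguments; they are not Calcite's classes and do not reproduce Calcite's logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for SqlNode. */
class Node {
    final String text;

    Node(String text) {
        this.text = text;
    }
}

/** Hypothetical stand-in for SqlSelect. */
final class SelectNode extends Node {
    SelectNode(String text) {
        super(text);
    }
}

/** Mimics a call with a generic operand accessor: the target type drives inference. */
final class Call {
    private final List<Node> operands;

    Call(List<Node> operands) {
        this.operands = operands;
    }

    @SuppressWarnings("unchecked")
    <S extends Node> S operand(int i) {
        // Unchecked cast: a wrong target type only fails at the call site.
        return (S) operands.get(i);
    }
}

/** Mimics a binding that places named arguments into declared parameter order. */
final class Binding {
    static final List<String> PARAMS = Arrays.asList(
            "SEARCH_TABLE", "COLUMN_TO_SEARCH", "COLUMN_TO_QUERY", "TOP_K");

    private final Node[] ordered = new Node[PARAMS.size()];

    Binding(Map<String, Node> namedArgs) {
        for (Map.Entry<String, Node> e : namedArgs.entrySet()) {
            int pos = PARAMS.indexOf(e.getKey());
            if (pos < 0) {
                throw new IllegalArgumentException("Unknown parameter: " + e.getKey());
            }
            ordered[pos] = e.getValue();
        }
    }

    /** Non-generic accessor: callers must cast to the subtype they expect. */
    Node operand(int i) {
        return ordered[i];
    }
}
```

With these stand-ins, `SelectNode s = call.operand(0);` compiles without a cast but returns whatever the user wrote first, while `(SelectNode) binding.operand(0)` needs the cast but always refers to SEARCH_TABLE regardless of the order in which the arguments were written.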

private static final String PARAM_SEARCH_TABLE = "SEARCH_TABLE";
private static final String PARAM_COLUMN_TO_SEARCH = "COLUMN_TO_SEARCH";
private static final String PARAM_COLUMN_TO_QUERY = "COLUMN_TO_QUERY";
private static final String PARAM_TOP_K = "TOP_K";
Contributor

Is the optional config param from the FLIP missing?

Member Author

Yes, but I plan to add this in https://issues.apache.org/jira/browse/FLINK-38430

Collections.singletonList(SqlKind.SELECT)))) {
boolean queryColumnIsNotLiteral =
binding.operand(2).getKind() != SqlKind.LITERAL;
if (!queryColumnIsNotLiteral && !lateral) {
Contributor

Isn't lateral always needed? What's the syntax for literal?

Member Author

LATERAL is not always needed if the SqlCall doesn't contain a correlation. For example, users can use the following statement to search:

SELECT * FROM TABLE(VECTOR_SEARCH(TABLE VectorTable, DESCRIPTOR(`g`), ARRAY[1.5, 2.0], 10))

Here, the query input is ARRAY[1.5, 2.0].
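The rule described in this reply can be written as a small decision function. This is a sketch that follows the prose explanation and the wording of the error message quoted earlier in the thread, not necessarily the exact condition in the diff. The class `LateralCheckSketch`, the enum `QueryOperand`, and the use of `IllegalStateException` are hypothetical.

```java
/** Hypothetical sketch of the "LATERAL only when correlated" rule; not the validator's code. */
final class LateralCheckSketch {

    /** Assumed classification of the COLUMN_TO_QUERY operand. */
    enum QueryOperand {
        /** A constant such as ARRAY[1.5, 2.0]; no reference to another table. */
        CONSTANT,
        /** A column of another table in the FROM clause, i.e. a correlation. */
        COLUMN_REFERENCE
    }

    /** LATERAL is needed only when the call is correlated with another table. */
    static boolean requiresLateral(QueryOperand operand) {
        return operand == QueryOperand.COLUMN_REFERENCE;
    }

    /** Rejects a correlated call that was written without LATERAL. */
    static void validate(QueryOperand operand, boolean lateral) {
        if (requiresLateral(operand) && !lateral) {
            throw new IllegalStateException(
                    "The query column is not literal, "
                            + "please use LATERAL TABLE to run VECTOR_SEARCH.");
        }
    }
}
```

Under this reading, the ARRAY[1.5, 2.0] statement above passes validation without LATERAL, while a call whose query input is a column of another table would have to be wrapped in LATERAL TABLE.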

CC

"SELECT * FROM TABLE(VECTOR_SEARCH(TABLE VectorTable, DESCRIPTOR(`g`), ARRAY[1.5, 2.0], 10))";

Contributor

I thought that was the syntax, but your test expects an exception, so is it not valid SQL?

Member Author

It's valid SQL. However, we haven't added a literal-related rule in the physical phase, so the planner cannot translate the SQL yet. The exception still indicates that the planner can parse the statement correctly.

Contributor

Got it. I guess it would be clearer to have something like https://github.com/apache/flink/pull/26553/files#diff-19970e8600e459e820e1310beed925a10450f695698257d85648e8114b5e5aaeR92 to indicate it's not an invalid case.

@fsk119 fsk119 merged commit 56bf7c8 into apache:master Oct 16, 2025